Deep Learning Paradigm with Transformed Monolingual Word Embeddings for Multilingual Sentiment Analysis
نویسندگان
چکیده
The surge of social media use brings huge demand of multilingual sentiment analysis (MSA) for unveiling cultural difference. So far, traditional methods resorted to machine translation—translating texts in other languages to English, and then adopt the methods once worked in English. However, this paradigm is conditioned by the quality of machine translation. In this paper, we propose a new deep learning paradigm to assimilate the differences between languages for MSA. We first pre-train monolingual word embeddings separately, then map word embeddings in different spaces into a shared embedding space, and then finally train a parameter-sharing deep neural network for MSA. The experimental results show that our paradigm is effective. Especially, our CNN model outperforms a state-of-the-art baseline by around 2.1% in terms of classification accuracy.
منابع مشابه
Deep Multilingual Correlation for Improved Word Embeddings
Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of ...
متن کاملBorrow a Little from your Rich Cousin: Using Embeddings and Polarities of English Words for Multilingual Sentiment Classification
In this paper, we provide a solution to multilingual sentiment classification using deep learning. Given input text in a language, we use word translation into English and then the embeddings of these English words to train a classifier. This projection into the English space plus word embeddings gives a simple and uniform framework for multilingual sentiment analysis. A novel idea is augmentat...
متن کاملWiktionary-Based Word Embeddings
Vectorial representations of words have grown remarkably popular in natural language processing and machine translation. The recent surge in deep learning-inspired methods for producing distributed representations has been widely noted even outside these fields. Existing representations are typically trained on large monolingual corpora using context-based prediction models. In this paper, we p...
متن کاملBeyond Bilingual: Multi-sense Word Embeddings using Multilingual Context
Word embeddings, which represent a word as a point in a vector space, have become ubiquitous to several NLP tasks. A recent line of work uses bilingual (two languages) corpora to learn a different vector for each sense of a word, by exploiting crosslingual signals to aid sense identification. We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by...
متن کاملUniPI at SemEval-2016 Task 4: Convolutional Neural Networks for Sentiment Classification
The paper describes our submission to the task on Sentiment Analysis on Twitter at SemEval 2016. The approach is based on a Deep Learning architecture using convolutional neural networks. The approach used only word embeddings as features. The submission used embeddings created from a corpus of news articles. We report on further experiments using embeddings built for a corpus of tweets as well...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1710.03203 شماره
صفحات -
تاریخ انتشار 2017